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Abstract 

Background: Halophiles are extremophilic microorganisms growing optimally at high salt concentrations. There 
are two strategies used by halophiles to maintain proper osmotic pressure in their cytoplasm: accumulation of 
molar concentrations of potassium and chloride with extensive adaptation of the intracellular macromolecules 
("salt-in" strategy) or biosynthesis and/or accumulation of organic osmotic solutes ("osmolyte" strategy). Our work 
was aimed at contributing to the understanding of the shared molecular mechanisms of protein haloadaptation 
through a detailed and systematic comparison of a sample of several three-dimensional structures of halophilic 
and non-halophilic proteins. Structural differences observed between the "salt-in" and the mesophilic homologous 
proteins were contrasted to those observed between the "osmolyte" and mesophilic pairs. 

Results: The results suggest that haloadaptation strategy in the presence of molar salt concentration, but not of 
osmolytes, necessitates a weakening of the hydrophobic interactions, in particular at the level of conserved 
hydrophobic contacts. Weakening of these interactions counterbalances their strengthening by the presence of 
salts in solution and may help the structure preventing aggregation and/or loss of function in hypersaline 
environments. 

Conclusions: Considering the significant increase of biotechnology applications of halophiles, the understanding 
of halophilicity can provide the theoretical basis for the engineering of proteins of great interest because stable at 
concentrations of salts that cause the denaturation or aggregation of the majority of macromolecules. 



Background 

Organisms thriving in "extreme environments", such as 
thermophiles, alkalophiles, acidophiles, halophiles, piezo- 
philes and psychrophiles, have drawn much interest in 
the scientific community because of the molecular adap- 
tation they underwent during evolution and for their 
biotechnological potential [1-4]. The environmental 
challenges that extremophilic organisms have to face 
necessitate, besides other physiological modifications, 
biosynthesis of macromolecules stable and active at 
environmental physical-chemical extreme conditions. 
These macromolecules display clearly distinguished fea- 
tures when compared to the macromolecules from 
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microorganisms found in the "normal" (mesophilic) 
environments [5]. 

Halophilic microorganisms are salt-loving extremophi- 
lic organisms that grow optimally at high salt concentra- 
tions. They were found [6] mainly in marine salterns 
and hypersaline lakes, such as the Great Salt Lake and 
the Dead Sea. A survey of the salt requirements in the 
microbial world shows a continuum of properties which 
makes it very difficult to define by sharp limits what a 
halophilic microorganism is. The accepted view [6] dis- 
tinguishes the halophilic organisms in: extreme halo- 
philes (growing best in media containing 2.5-5.2 M salt), 
borderline extreme halophiles (growing best is media 
containing 1.5-4.0 M salt), moderate halophiles (growing 
best in media containing 0.5-2.5 M salt), and halotoler- 
ant microorganisms that do not show an absolute 
requirement for salt for growth but grow well up to 
often very high salt concentrations (considered 
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extremely halotolerant if the growth range extends 
above 2.5 M salt). 

Distribution of halophilic microorganisms within the 
tree of life shows that they are widespread in the bacter- 
ial and Archaea kingdoms. Eukaryotic halophilic micro- 
organisms, such as fungi and algae, are also known [7]. 

All halophilic microorganisms share a basic property: 
their cytoplasm must be at least isoosmotic with their 
surrounding medium. There are two strategies used by 
halophilic microorganisms to maintain proper osmotic 
pressure in their cytoplasm. The first involves accumula- 
tion of molar concentrations of potassium and chloride. 
This strategy requires extensive adaptation of the intra- 
cellular enzymatic machinery to the presence of salts, as 
the proteins should maintain their proper conformation 
and activity at near-saturating salt concentrations. The 
proteome of such organisms is highly acidic, and most 
proteins denature when suspended in low salt. Gener- 
ally, the microorganisms relying on such "salt-in" strat- 
egy are obligate halophilic Archaea. The other strategy 
of haloadaptation is based on the biosynthesis and/or 
accumulation of organic osmotic solutes (osmolytes) 
such as ectoin, glycine betaine or others [8]. Cells rely- 
ing on this strategy exclude salts from their cytoplasm 
as much as possible. The high concentrations of organic 
"compatible" solutes do not greatly interfere with nor- 
mal enzymatic activity. Fewer adaptations of the cells' 
proteome are therefore needed. Such organisms can 
often adapt to a broad salt concentration range [6]. 

Halophilic proteins from "salt-in" organisms are dis- 
tinguished from their non-halophilic homologous pro- 
teins by exhibiting remarkable instability in low salt 
concentration and by maintaining soluble and active 
conformations in an environment generally detrimental 
to other proteins. Indeed, hypersaline conditions favor 
protein aggregation and collapse, interfere with the elec- 
trostatic interactions between protein residues, and are 
responsible for a general decrease in the availability of 
water molecules [9]. Halophilic proteins, rather than 
being unfolded by these conditions, appear to be depen- 
dent on the presence of salts. In recent years, detailed 
investigations have tried to unveil the relationships 
between structure and stability in halophilic proteins 
[10]. In particular, these studies suggested that the halo- 
philic proteins bind significant amounts of salt and 
water in solvent conditions similar to their physiological 
environment. The peculiar ability of halophilic proteins 
to bind large amount of salts is largely dependent on 
the number of acidic amino acids on protein surface 
[11-17]. The role of electrostatic interactions in the sta- 
bility and folding of halophilic proteins has been investi- 
gated and recognized as a key determinant of 
haloadaptation [18,19]. Moreover, it was observed that 
halophilic proteins are characterized by a general 



decrease in hydrophobic amino acid frequency and a 
greater propensity to form random-coil structures, 
rather than a-helices [15]. Indeed, protein folding and 
adequate stability of the native structure in a hypersaline 
environment may require evolutionary modulation of 
the hydrophobic interactions occurring at the protein 
core. Most of the studies carried out to unveil the struc- 
tural characteristics of halophilic proteins, were based 
on sequence comparison at proteome and genome levels 
or were focused onto single or few protein families. 
These analyses provided undoubtedly valuable indica- 
tions on the biophysical and biochemical properties of 
the halophilic proteins. However, scrutiny of proteome 
and genomic sequences may not unravel subtle differ- 
ences at the three-dimensional structural level while 
structural analysis of a single or few protein families 
may lack sufficient generalization. For these reasons we 
report in this work a systematic comparison between 
the available three-dimensional structures of halophilic 
enzymes deposited in the data banks and the structure 
of one of their homologues, to investigate the differ- 
ences possibly related to shared strategies of structural 
adaptation to high salt environments. Use of three- 
dimensional structure made it possible to investigate 
subtle modifications of the surface and hydrophobic 
core of the halophilic proteins especially at the level of 
conserved hydrophobic contacts [20]. 

Results 

Data set 

A set of 15 halophilic enzymes, 9 of which from 
"extreme" halophiles and 6 from "halotolerant" organ- 
isms, was collected along with their non-halophilic 
structural homologues (Table 1). Among the halophilic 
enzymes, 8 come from Archaea that adopt the "salt-in" 
strategy (SALTIN), and 7 from Eubacteria that adopt 
the "osmolytes" strategy (OSMOL). In this paper, the 
entire analysis was carried out considering the two 
groups separately. 

Four pairs of halophilic enzymes (PDB ID: 2B5W, 
2J5K, 1NML, 3IGN) have a quaternary structure differ- 
ent from their non-halophilic homologs (Table 1) and 
therefore were excluded from the calculations involving 
proteins in their quaternary structure, such as calcula- 
tion of the electrostatic potential surface. The pair cyto- 
chrome c 552 from Marinobacter hydrocarbonoclasticus 
(PDB ID: 1CNO) and from Pseudomonas stutzeri (PDB 
ID: 1ETP) has been included in the analysis, for the rea- 
sons explained in Methods section. 

Accessible surface area and electrostatic potential 

To calculate the relative solvent accessibility (ASA) and 
the magnitude of the relative polar and apolar compo- 
nents, protein structures were used in their quaternary 
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Table 1 Data set. 
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List of protein pairs utilized in the work. Boldfaced PDB codes indicate extreme halophiles. 

a) Protein Data Bank code 

b) Number of monomers in the biological unit 

c) Source organisms. (A) = Archaea; (B) = Bacteria 

d) Optimal range of NaCI concentration for growth. The first 9 lines contain structures from extreme halophiles; the remaining 6 lines report moderate halophiles. 

e) The survival strategy adopted by corresponding organism: "salt-in" or "osmolytes" strategy 
R Crystallographic resolution 

9) Protein name as reported in the Protein Data Bank 
h) Sequence percentage of identity between protein pairs 



form. The surface polar area formed by oxygen and 
nitrogen atoms was calculated. The area subset formed 
only by side-chain oxygen or nitrogen atoms (namely, 
excluding atoms belonging to the polypeptide backbone) 
was also considered. 

In SALTIN enzymes, the fractional apolar component 
of the relative solvent accessible surface is significantly 
smaller than the corresponding area in non-halophilic 
counterparts (zlApA in Table 2). The polar area formed 
by oxygen atoms is significantly larger in halophilic 
enzymes; the same result is obtained when the analysis 
is repeated excluding backbone oxygen atoms and 
including only side-chain oxygen atoms. Likewise, the 



polar area formed by nitrogen atoms is significantly 
smaller than the corresponding area in non-halophilic 
counterparts; the same result is obtained when the ana- 
lysis is repeated without backbone nitrogen atoms and 
including only side-chain nitrogen atoms. These differ- 
ences are reflected in the surface electrostatic potential, 
which is significantly more negative in halophilic 
enzymes than in the non-halophilic counterparts (Table 
3). In OSMOL sample, the apolar component of the 
relative solvent accessibility was also significantly smaller 
than the corresponding area in non-halophilic counter- 
parts (Table 2); however, the decreased surface is caused 
solely by the increase of the polar area formed by side- 
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Table 2 4ASA in the SALTIN and OSMOL samples. 
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Differences of fractional accessibility surface area (MSA) in the SALTIN and OSMOL samples for different class of atoms. The differences are between the surface 
areas calculated in the halophilic protein and the corresponding areas in the non-halophilic counterpart. The calculations were performed considering the 
proteins in their quaternary structure. 

a) Apolar MSA difference between fractional apolar exposed areas of the halophilic protein and the corresponding non-halophilic homolog 

b) Oxygen atom fractional MSA 

c) Side-chain oxygen atom fractional MSA 

d) Nitrogen atom fractional MSA 

e) Side-chain nitrogen fractional MSA 

f) Total fractional MSA 

g) Average fractional MSA 

h) Boldfaced and underlined p-values indicate significant or possible trend, respectively 



chain oxygen atoms. In fact, there is no significant dif- 
ference in the areas formed by oxygen atoms or nitro- 
gen atoms. The analysis was repeated considering only 
the subset of residues identically conserved between the 
halophilic and the corresponding non-halophilc homolo 
gue in the SALTIN and OSMOL samples. In this case, 
no significant difference was seen in the exposed areas 
(see additional file 1). 

Surface electrostatic potential displays no significant 
difference although, in 3 out of 5 pairs, it is more nega- 
tive in the OSMOL halophilic proteins (Table 3). 



Apolar contact area 

The apolar contact area (ACA) between residues was 
calculated for three different structural regions namely 
the core, the interface between the monomers and the 
protein surface. Residues were assigned to the various 
structural regions according to the alternative accessibil- 
ity thresholds described in the Methods section. 

ACA in the core of SALTIN halophilic proteins is not 
significantly different, according to the statistical tests, 
from the ACA of their homologous counterparts. How- 
ever, it is consistently smaller (Figure 1). This trend is 



Table 3 4AAP for SALTIN and OSMOL samples. 
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Differences of average atomic potential (MAP) for SALTIN and OSMOL samples. Units are kT/e. 
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b) Average MAP 

c) Boldfaced digits indicate significant p-values 
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Figure 1 core ^ACA. Histograms reporting the core MCA in the 
SALTIN (upper panel) and OSMOL (lower panel) samples at different 
solvent accessibility thresholds (thresholds and relative grey codes 
are reported in the box in the upper right corner of the figure). 
Residues with accessibility less than the thresholds are considered 
during calculation of the differences. PDB IDs of the halophilic 
protein are reported on the horizontal axes. 



more pronounced when the core contains only residues 
at 0% accessibility. At variance, OSMOL enzymes have 
an ACA that is globally comparable with that of non- 
halophilic counterparts (Figure 1), even at 0% accessibil- 
ity threshold. 

Likewise, ACA in the SALTIN monomer interface is 
not significantly dissimilar from the ACA of their non- 
halophilic counterparts. However, even in this case, all 
the zlACAs were negative except for the pair 3IFV- 
1RWZ. The analysis at interface for OSMOL enzymes is 
reduced to two pairs and is therefore of no statistical 
interest, however, the ACA formed by interfacial resi- 
dues in halophiles is comparable to that of non-halophi- 
lic organisms. 

The surface ACA in SALTIN and OSMOL halophilic 
enzymes is not significantly different from the corre- 
sponding surface in the non-halophilic counterparts 
(Figure 2). 

Differences in apolar area buried upon folding 
(AAApA v _ ¥ ) between each halophilic and non-halophilic 
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Figure 2 surface ^ACA. Histograms reporting the surface AACA in 
the SALTIN (upper panel) and OSMOL (lower panel) samples at 
different solvent accessibility thresholds (thresholds and relative grey 
codes are reported in the box in the upper right corner of the 
figure). Residues with accessibility greater than the thresholds are 
considered during the calculation of the differences. PDB codes of 
the halophilic protein are reported on the horizontal axes. 



pair, have been calculated. According to the Wilcoxon 
and the £-test, none of the average values were signifi- 
cantly different from 0. Distribution of the differences in 
the SALTIN and OSMOL samples are reported in addi- 
tional file 2. 

Conserved hydrophobic contact area 

All 15 structures collected from halophilic organisms 
have been structurally aligned with the non-halophilic 
counterparts, and the structurally conserved regions 
(SCR) were calculated. The conserved contacts 
between hydrophobic residues (CHC) were identified 
for each pair of structures, and were selected only 
those formed by evolutionary conserved residues. Then 
for each CHC, the apolar contact area formed by the 
pair of residues was calculated. The number of CHCs 
selected varies for each enzymatic family depending on 
the relative degree of conservation of residues in that 
family (Table 4). 
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Table 4 Number of CHCs found in the two samples with the corresponding AACA. 



SALTIN 
HALOPHILE 


Seq. 
lenght 


NON 
HALOPHILE 


Seq. 
lenght 


Sequence 
identity(%) 


No. 
CHC 


OSMOL 
HALOPHILE 


Seq. 
lenght 


NON 
HALOPHILE 


Seq. 
lenght 


Sequence 
identity(%) 


No. 
CHC 


1D0I 


128 


1FXA 


98 


51 


50 


1NWZ 


125 


1 MZU 


129 


46 


55 


1TJ0 


182 


2VXX 


192 


36 


36 


1CNO 


87 


1ETP 


190 


44 


26 


2B5W 


357 


2CD9 


366 


30 


154 


1NML 


326 


3HQ6 


345 


47 


157 


2CC6 


68 


2V18 


68 


42 


33 


2VPN 


316 


3FXB 


326 


64 


132 


1ITK 


731 


2FXG 


748 


60 


514 


3IGN 


177 


3I5C 


206 


62 


63 


2AZ3 


164 


3B54 


161 


54 


71 


3BSM 


413 


2QJJ 


402 


40 


187 


2J5K 


304 


1Y6J 


318 


33 


101 


3IBM 


167 


3KGZ 


156 


66 


65 


3IFV 


247 


1RWZ 


245 


36 


104 














Total a) 
Average b) 

t-test c) 
Wilcoxon c) 










329.33 
-0.31 
0.03 
0.01 












58.56 
-0.08 
0.52 
0.63 



Number of conserved hydrophobic contacts (CHCs) found in the two samples and the corresponding overall difference of apolar contact area (MCA) 

a) Total MCA in the CHCs 

b) Average MCA in the CHCs 

c) Boldfaced digits indicate significant p-values 



In SALTIN sample the average ACA in CHCs is sig- 
nificantly smaller in halophilic enzymes than non-halo- 
philic counterparts (Table 4). At variance, OSMOL 
sample displays no significant difference. It has been 
also tested whether the differences of the CHC areas in 
the halophilic and non-halophilic counterparts corre- 
lated with sequence identity. No correlation was found 
(see additional file 3). 

Statistics on the preferred amino acid exchanges at the 
SALTIN CHCs indicate that He Val is the most fre- 
quent replacement found in the direction mesophile — » 
halophile with a Z-score equal to 4.34. At variance, the 
most frequent exchange in the OSMOL CHCs is Leu — » 
He with Z-score 3.74. 

Discussion 

Halophilic proteins have been extensively studied since a 
long time [10] and very much is known about their dis- 
tinguishing characteristics, although several aspects not 
sufficiently elucidated of their molecular adaptation 
mechanisms still remain. Indeed, most of the published 
studies relied on comparative analyses of the character- 
istics of halophilic proteome and genomic sequences 
that provided general overview of the molecular signa- 
ture of haloadaptation [15] but may not unravel subtle 
differences at the three-dimensional structural level. 
Likewise, three-dimensional structural comparisons were 
generally carried out within a single or a few protein 
families which may not be sufficiently representative of 
the halophilic proteome. Therefore, our work was aimed 
at contributing to the understanding of the shared mole- 
cular mechanisms of protein haloadaptation through a 
detailed and systematic comparison of a larger sample 
of several three-dimensional structures of halophilic and 



non-halophilic proteins now available in the databanks. 
This approach highlights the adaptation strategies 
shared by the extreme halophilic proteins, besides the 
variations related to the peculiarities of each single 
family. Indeed, the extent to which individual proteins 
adapt to halophilic conditions varies, presumably due to 
their diverse characteristics and roles within the cell. 
For example, the malate dehydrogenase from Salinibac- 
ter ruber recently characterized [21] shares characteris- 
tics of a haloadapted archaeal enzyme and of non- 
haloadapted enzymes from other eubacterial species. 
Moreover, a proliferating cell nuclear antigen from 
Haloferax volcanii [16] displayed an unexpectedly low 
number of ion pairs at the monomer-monomer 
interface. 

Proteins were taken both from microorganisms 
(mostly extreme halophilic Archaea) that utilize the 
"salt-in" strategy to cope with the strong environmental 
osmotic pressure, and from halophilic microorganisms 
using the intracellular accumulation of osmolytes and 
low cytoplasmatic salt concentration to counterbalance 
the external osmotic pressure. Proteins synthesized by 
the "salt-in" organisms are surrounded by a high-salt 
concentration environment and are in contact with it. 
At variance, intracellular proteins from the "osmolyte" 
cells are surrounded by a high concentration solution of 
compatible solutes which apparently do not interfere 
significantly with protein solubility, stability and activity 
[22]. Structural differences observed between the "salt- 
in" and the mesophilic homologous proteins (this set 
was referred to as SALTIN sample) were contrasted to 
those observed between the "osmolyte" and mesophilic 
pairs (here called OSMOL sample). The reasons for this 
comparative approach is twofold: the OSMOL sample 
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was the reference sample against which to test the struc- 
tural differences observed in the SALTIN sample; at the 
same time we verified the presence of possible molecu- 
lar adaptations in the OSMOL proteins due to their 
immersion into the osmolyte solution. As a further filter 
to eliminate possible noise due to non-normality distri- 
bution or paucity of data, two statistical tests (the para- 
metric £-test and the non-parametric Wilcoxon test) 
were used to assess the significance of observed differ- 
ences of structural parameters. Only differences defined 
significant according to both tests were deemed bona 
fide significant signals. 

Starting from the surface features of the halophilic 
proteins, we demonstrated a significant reduction of 
apolar surface in the SALTIN sample and a parallel 
increase of the polar area formed by the side chain oxy- 
gen atoms, while the nitrogen side chain atoms lessen 
their contribution to the surface. These results reflect 
the well-known increase of surficial aspartate and gluta- 
mate residue frequency combined to the decrease of 
lysine residues [15,23]. It should be mentioned indeed 
that a reduction of hydrophobic exposed surface in the 
glucose dehydrogenase from Haloferax mediterranei has 
been demonstrated [24]. In this case however, reduction 
of apolar surface was caused by reduction of lysine resi- 
dues. Thus, overall, the 2-fold reduction in the propor- 
tion of lysine residues in the sequence leads to a 4-fold 
reduction in the exposed hydrophobic accessible surface 
area contributed by the associated alkyl component of 
the lysine side chain. 

We noticed that a significant decrease of apolar sur- 
face area in the protein synthesized by the halophilic 
organism can be observed in the OSMOL sample as 
well, paralleled by an ostensible tendency to increase the 
presence of side chain oxygen atoms (Table 2). Most of 
the compatible osmolytes produced by halophilic pro- 
karyotes are neutral species, which contain charged 
groups (for example amino acid derivatives); therefore, it 
is not surprising that maintenance of solubility and sta- 
bility of proteins at decimolar osmolyte concentration, 
required surface remodeling partly similar to that 
observed in the SALTIN sample, implying decrease of 
apolar surface. Indeed, several physical-chemical models 
have been proposed in the literature, according to which 
the charged sites of the stabilizing osmolytes interact 
with the oppositely charged polar areas of the macromo- 
lecule surface [25]. On the contrary, stabilizing osmo- 
lytes avoid contact with the polypeptide backbone. This 
property is called "osmophobic effect" [26] and it forces 
protein to fold in vivo, complementing the well known 
hydrophobic effect. However, the overall extent of sur- 
face modification is much smaller than that observed in 
the extreme halophilic proteins. Indeed, the examination 
of the surface atomic potential variations in the SALTIN 



sample (Table 3) confirms that the electrostatic potential 
is significantly more negative in halophilic proteins 
rather in their mesophilic counterparts. Negative surface 
electrostatic potential has been demonstrated to be a 
characteristic factor of haloadaptation like, for example, 
in nucleoside diphosphate kinase from Halobacterium 
salinarum [27] or in an esterase from Haloarcula maris- 
mortui [19]. Thermal stability of the former protein 
indeed resulted to be strongly dependent on salt con- 
centration, as predicted by theoretical studies [18]. The 
picture is different in the case of the OSMOL sample: 
the increase of the surface formed by side-chain oxygen 
atoms is not coupled to the decrease of nitrogen atoms: 
this account for the lack of an apparent modification of 
surficial potential. Recently, it has been demonstrated 
[17] by extensive site-directed mutagenesis on three pro- 
tein domains (the halophilic 1A domain of the NAD- 
dependent DNA ligase N from Haloferax volcanii, the 
homologous domain from E.coli, and the mesophilic IgG 
binding domain of the protein L from Streptococcus 
magnus) that halophilicity of these proteins is directly 
related to a decrease in the solvent accessible surface. 
Authors stated that reduction of the solvent accessible 
area introduced upon mutation causes a progressive 
destabilization of the molecule, probably due to a reduc- 
tion in the protein's hydrophobic effect. As a conse- 
quence, mutations increasing salt induced stabilization 
also destabilize the protein in the absence of salt, con- 
verting a mesophilic protein into an obligate halophilic 
form, a trend found in natural halophilic proteins. 

However, we did not observe any significant reduction 
of the accessibility surface (data not shown) in the 
SALTIN and OSMOL halophilic proteins whereas we 
noticed the tendency, in the halophilic proteins of the 
SALTIN samples, to reduce the apolar contact area of 
the residues exposing more than 25% of their surface. 
Therefore one could argue that the reduction of hydro- 
phobic effect be a key component of the haloadaptation 
mechanisms of surfaces. 

However, surface property variations are not the only 
feature relevant to protein stability and solubility in high 
salt environments. Indeed, proper folding of the protein 
in these conditions requires the formation of appropri- 
ate hydrophobic interactions in the interior of the 
macromolecule. This is supposedly even more important 
during the early step of protein folding of the nascent 
polypeptide chain. In this perspective view, we analyzed 
the variations of apolar contact area in the hydrophobic 
core of the halophilic SALTIN and OSMOL surface 
compared to their mesophilic counterparts. No statisti- 
cally significant difference could be found. Nonetheless, 
it should be noted that a consistent decrease of the apo- 
lar contact area in the halophilic proteins of the 
SALTIN sample can be observed for core residues at 0% 
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relative accessibility (Table 4). We believe this is indica- 
tive of a trend, absent in the OSMOL sample, related to 
haloadaptation of the SALTIN proteins. Similar trend 
can be observed at the subunit interfaces of extreme 
halophilic proteins. 

Differences in apolar area buried upon folding 
(AAApA v _ ¥ ) between each halophilic and non-halophilic 
pairs display overall no significant difference. This may 
suggest that, globally, the fraction of apolar area lost 
during folding is similar in the two cases (halophilic/ 
non-halophilic) although the apolar surface available to 
the halophilic protein is smaller. This observation is 
reflected and supported by the significant reduction of 
exposed apolar area (Table 2) and apolar contact area 
(Table 4). It should be mentioned however that the diffi- 
cult definition of unfolded state of a protein [28] should 
recommend caution in the interpretation of these 
results. Moreover, in this specific problem, electrostatic 
repulsion of all the negative charges can render the cal- 
culation of the solvent accessible surface area of 
unfolded state even more inaccurate. 

It is well known that high NaCl concentration 
strengthens hydrophobic interactions [29] and recently 
the thermodynamics of this effect has been analyzed in 
synthetic polymers [30]. Hydrophobicity is the main 
driving force of protein folding [31]; it must be finely 
balanced to confer proteins proper flexibility and stabi- 
lity compatible with their function. To maintain the 
magnitude of hydrophobic interactions within useful 
intervals in a high salt environment, extent of protein 
interior hydrophobic surface should be reduced in order 
to lessen the change in solvated hydrophobic areas. 
Interestingly, it was demonstrated [30] that a potent 
osmolyte, trimethylamine oxide, have a negligible effect 
on the strength of hydrophobic interactions. This find- 
ing suggests that hydrophobic interactions in proteins of 
"osmolyte" microorganisms need not to be significantly 
altered. 

This picture is even clearer when inspecting and con- 
trasting the conserved hydrophobic contacts in the pro- 
teins belonging to the SALTIN and OSMOL sample. 
The CHCs [18] are sites in the interior of the protein in 
correspondence of which there is a hydrophobic contact 
conserved during divergent evolution. Evolutionary con- 
servation suggests that the contact be essential for fold- 
ing and/or stability of the protein. The average apolar 
contact area at the CHCs is significantly smaller in halo- 
philic proteins than in their mesophilic counterparts. As 
expected, OSMOL sample display no difference (Table 
4). Accordingly, the most frequent residue exchange in 
the SALTIN CHCs is the replacement of He with Val in 
halophilic proteins which reduces the hydrophobic 
volume buried in the core. Indeed, the effect of such 
substitution is the loss of a methyl group in the 



hydrophobic nucleus of the protein. This is expected to 
reduce the hydrophobic interactions, although confor- 
mational strain can also compound the influence on the 
stability of the protein [32]. As expected, the most fre- 
quent residue substitution of the OSMOL CHCs, 
namely Leu^ He, does not imply any reduction in the 
number of side-chain carbon atoms. As an example, Fig- 
ure 3 reports two cases taken from the dataset utilized 
in the work in which a CHC He was replaced by Val in 
the halophilic protein. 

Conclusions 

The present analysis suggests that a shared haloadapta- 
tion strategy of proteins in the presence of molar salt 
concentration, but not in the presence of osmolytes, 
necessitates a weakening of the hydrophobic interactions 
in general, and in particular at the level of core and con- 
served hydrophobic contacts. Weakening of these inter- 
actions counterbalances their strengthening by the 
presence of salts in solution and may help the structure 
preventing aggregation and/or loss of function in hyper- 
saline environments. Indeed, decreasing hydrophobicity 
makes halophilic proteins unstable in low-concentration 
salt solutions and may in part explain the request of the 
halophilic proteins for high salt concentrations. To com- 
plete the picture, the destabilization of halophilic pro- 
teins at low-salt concentration due to the strong 
electrostatic repulsion should be considered [18,33]. 
Shrinking of hydrophobic contacts must be even more 
critical for the early stages of folding when intramolecu- 
lar hydrophobic nuclei must correctly form to guide the 
polypeptide through the folding funnel to the native 
state. 

Considering also the significant increase of biotechnol- 
ogy applications of halophiles, the comprehension of the 
multifaceted etiology of halophilicity (including the elec- 
trostatic factors) can provide the theoretical basis for 
the engineering of proteins of great interest because 
stable at concentrations of salts that cause the denatura- 
tion or aggregation of the majority of macromolecules. 

Methods 

Selection of protein structures 

The crystallographic structures of the available high salt 
concentration active enzymes were found in PDB [34]. 
The search was carried out using the keywords: "halo- 
phil", haloarc", "halobacter" and the like. For each of the 
obtained structures, it was verified that the source 
organism was present in a previously compiled list of 
halophilic organisms, based on the website XBASE [35] 
http://www.xbase.ac.uk/ and in the literature. The 
selected organisms have optimal growth conditions that 
support the presence of genuine adaptation to high salt 
concentration environments. In fact, those organisms 
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Figure 3 Examples of residue substitutions decreasing the area of the conserved hydrophobic contacts. A) Superposition of two 
equivalent CHCs in the halophilic glucose dehydrogenase from Haloferax mediterranei (color orange, PDB ID: 2B5W) and its counterpart from 
Sulfolobus solfotoricus (color light blue, PDB ID: 2CD9). Secondary structures are represented as cartoon and residues involved in the apolar 
contact are shown as sticks models. Halophilic Val30 replaces Ile31 of the mesophile. The other contacting residue, Leu, is conserved in both 
proteins. B) Superposition of two equivalent CHCs in the halophilic malate dehydrogenase from Haloarcula marismortui (color magenta, PDB ID: 
2J5K) and its mesophilic counterpart from Clostridium thermocellum (color cyan, PDB ID: 1Y6J). Secondary structures are represented as cartoon 
and residues involved in the apolar contact are shown as sticks models. Halophilic Val27 and Val94 replace Ile13 and Ile78 of the counterpart, 
respectively. 



requiring for optimum growth concentration of NaCl < 
0.5 M and/or temperatures < 20°C or > 55°C, were 
excluded from the data sample. Moreover, the organ- 
isms were divided into two groups corresponding to dif- 
ferent survival strategies: in the first group are the 
archaeal halophiles using the "salt-in" strategy (which 
will be referred to as SALTIN sample); in the latter the 
Eubacteria using the "osmolytes" strategy (OSMOL 
sample) [36]. 

To select only halophilic structures matching specific 
criteria and to eliminate redundancy, we used the 
PISCES server http://dunbrack.fccc.edu/PISCES.phpl37], 
which enables the creation of a list of structures that 
meet arbitrary thresholds, starting from an initial user- 
defined larger sample. The parameters used were: maxi- 
mum percentage identity 90%, maximum resolution 2.5 
A, maximum R-value 0.3, minimum chain length 50 
residues, and utmost chain length 1000 residues. Homo- 
logous structures from non-halophilic organisms were 
retrieved from PDB by means of the program PSI- 
BLAST [38]. To ensure structural homology, only 
sequences sharing > 30% and < 90% residue identity to 
the halophilic counterpart were considered. In cases 
where more counterparts were available for each halo- 
philic protein, only the homolog with the highest per- 
centage identity to the protein of interest was 
considered to minimize possible phylogenetic drift 



effects on the statistics. At the end of the procedure, 15 
pairs of halophilic and non-halophilic proteins were col- 
lected, 8 of which corresponded to organisms that adopt 
the "salt-in" strategy and constitute the SALTIN sample, 
and 7 that adopt the "osmolytes" strategy and compose 
the OSMOL sample. In cases where the protein struc- 
tures are used in their biological unit form, the pairs are 
reduced to 11 because in 4 cases there is no correspon- 
dence between the quaternary structure of homologous 
halophilic and non-halophilic proteins. Although Table 
1 shows that the two proteins of the pair 1CNO-1ETP 
have a different quaternary structure, this pair was 
included in the analysis. In fact, since 1ETP is a fusion 
protein of two polypeptide chains each equivalent to the 
monomer of 1CNO, the two quaternary forms are over- 
all similar. 

Apolar contact area (ACA) 

The apolar contact area is the area of contact between 
coupled non-polar atoms in a protein. It was calculated 
for different structural environments: nucleus, interface 
between monomers and protein surface. The program 
utilized was Pdb_np_cont [39], which calculates the area 
of contact between coupled non-polar atoms starting 
from a standard PDB file. Briefly, this method is based 
on the classification of points located on a sphere of 
interaction radius, surrounding each non-polar atom. 
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The interaction radius is the van der Waals radius of 
each atom type, plus the radius of a water molecule. 
The output of this program was utilized to calculate the 
pairwise residue contact areas for every possible pair of 
residues belonging to the structures analysed. 

Individual protein chains were used for analysis of the 
protein core, while analyses of the protein subunit inter- 
faces were carried out with structures in their quatern- 
ary assembly. Only the 9 pairs of proteins possessing an 
oligomeric biological unit were used in this case. Protein 
interfaces consists of those residues making apolar con- 
tacts to another protein chain, as defined by 
Pdb_np_cont [38]. 

A residue was assigned to the structural environment 
"nucleus" if its relative solvent accessibility, calculated 
with the program NACCESS [40], is less than four dif- 
ferent alternative thresholds (0%, 3%, 5% and 9%). A 
residue is instead assigned to the protein surface if its 
relative solvent accessibility is greater than an arbitrarily 
fixed threshold. To this end, four different alternative 
thresholds (25%, 40%, 55% and 70%) were used. 

For each structural environment considered the per- 
residue mean apolar contact area was calculated by 
dividing the total apolar contact area by the number of 
residues belonging to that structural environment. 

Conserved hydrophobic contacts (CHCs) were identi- 
fied in each pair of homologous enzymes through the 
combined use of the web tools CE-MC [41] for calculat- 
ing the structural alignment, SCR_FIND for identifica- 
tion of structurally conserved regions and CHC_FIND 
for the detection of the CHCs and their apolar contact 
area [42]. SCRs were defined as regions displaying a 
similar local conformation, with a mean positional 
RMSD of the equivalent a -carbon positions of the struc- 
tures superposed < 3.0 A, lacking indels and composed 
of at least three consecutive residues. A CHC is defined 
as a conserved hydrophobic contact formed between 
two residues belonging to an SCR, in members belong- 
ing to a family or superfamily of proteins [42], We con- 
sidered only those contacts that are formed between 
residues distant at least three positions along the 
sequence and residues that are evolutionarily conserved 
according to the server CONSURF [43], i.e. belonging to 
the classes 7, 8 and 9. These three classes contain the 
most conserved amino acid positions from a total of 
nine equally sized categories of relative degree of 
conservation. 

Preferred amino acid substitution in CHCs 

Amino acid substitutions of residues involved in the for- 
mation of structurally and evolutionarily conserved 
hydrophobic contacts between halophilic and non-halo- 
philic proteins were determined by analysing the align- 
ment of the SCRs of each pair. For each residue X, 



belonging to a non-halophilic protein and involved in 
making CHCs, aa x ^y was defined as the number of 
times X is substituted by the residue Y of the halophilic 
protein. Likewise, aa Y ^x is defined. Therefore, a substi- 
tution matrix can be obtained by computing the differ- 
ence between aa x ^y and aa Y ^x over the whole dataset 
of protein pairs /<, according to: 

k 

where C s is the element of the substitution matrix. 

The mean and standard deviation of the overall 
exchange matrix were determined; the significance R X y 
of the exchange X^Y was then calculated by dividing 
the difference between C 5 , and the overall matrix mean 
C by the standard deviation a: 

R XY = (2) 
a 

R X y values > 3.0 standard deviations (corresponding to 
a probability P < 0.01 that the observed difference was 
obtained by chance) from the mean value were consid- 
ered statistically significant [18]. 

Accessible surface area (ASA) analysis 

The ASA is the area of the protein surface which can 
be in contact with the solvent. The total apolar (and 
the complementary polar) component of the entire 
accessible surface area of each protein in its quaternary 
form was calculated with the server GETAREA http:// 
curie.utmb.edu/getarea.html with default settings [44]. 
The contribution of different atomic types to the polar 
area, namely oxygens, side-chain oxygens, nitrogens 
and side-chain nitrogens was also investigated. To 
compare the different surface areas of halophilic and 
non-halophilic proteins, they were normalized by divi- 
sion by the total accessible surface area of the corre- 
sponding protein. 

The differences between the fraction of apolar accessi- 
ble surface area in the unfolded and folded form of each 
protein considered (zlApAu_ F ) were calculated through 
the web server http://roselab.jhu.edu/utils/unfolded.html 
[28]. The differences between the zlApAu_ F in each 
halophilic protein and in its corresponding homolog 
(AAApA v _Y) were calculated and statistically tested. 

Electrostatic potential calculation 

The surface electrostatic potential of the proteins in 
their quaternary form was calculated using the program 
DELPHI [45]. Salt concentration was set equal to 0, 
since identical environmental conditions can better 
delineate differences between the electrostatic potential 
of the halophilic and non-halophilic homolog. Internal 
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and solvent dielectric constants were set to 4.0 [46] and 
80.0 respectively. The other parameters used were set to 
the default values: grid scale = 1.2, box fill = 60%, probe 
radius = 1.4 A, and van der Waals surface. To compare 
the potential of halophilic and non-halophilic proteins 
of different lengths, the average atomic potential (AAP) 
was calculated dividing the total electrostatic potential 
by the total number of atoms. 

Statistical tests 

Whenever possible, differences between the structural 
properties considered were calculated within the two 
samples: the SALTIN halophilic and non-halophilic 
homologs, and the OSMOL halophilic and non-halophi- 
lic pairs. Differences of structural properties are denoted 
by A: for example, AACA indicates halophilic minus 
non-halophilic apolar contact areas. 

The As between the structural properties of halophilic 
enzymes and their corresponding non-halophilic coun- 
terparts were tested within samples using two statistical 
tests, a paired £-test and a non-parametric Wilcoxon 
signed-rank test [47]. In the former case the null 
hypothesis is that the average A was 0 at 0.05 ^-value 
while in the latter case the null hypothesis is that the 
median of the As was 0 at 0.05 Rvalue. The parametric 
£-test assumes that the tested data come from a normal 
distribution, while the Wilcoxon test it is less restrictive 
since it does not require such a condition. To enhance 
robustness of the conclusions drawn from the structural 
comparisons, only differences resulting significant from 
both tests were deemed bona fide significant results. 

Additional material 



List of abbreviations 

A: prefix indicating the difference between the property measured in the 
halophilic and in the non-halophilic homologous proteins; AAP: average 



atomic potential; ACA: apolar contact area; ApA: apolar accessible area; 
Z\ApAu_ F : difference between ApA in the unfolded and folded state; ASA: 
solvent accessible surface area; CHC: conserved hydrophobic contacts; 
OSMOL: halophilic organisms adopting the "osmolytes" strategy; RMSD: Root 
mean square deviation; SALTIN: halophilic organisms adopting the "salt-in" 
strategy; SCR: structurally conserved region. 
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